Skip to content

[EP API] header-only adapter for EP API#26919

Open
fs-eire wants to merge 8 commits intomainfrom
fs-eire/ep-api-adatper-header-only
Open

[EP API] header-only adapter for EP API#26919
fs-eire wants to merge 8 commits intomainfrom
fs-eire/ep-api-adatper-header-only

Conversation

@fs-eire
Copy link
Contributor

@fs-eire fs-eire commented Jan 6, 2026

Description

This PR adds a few headers for supporting building WebGPU EP and CUDA EP as plugin EPs.

See summary of #26907

@fs-eire fs-eire force-pushed the fs-eire/ep-api-adatper-header-only branch from 74b2ed4 to b8a9132 Compare February 2, 2026 23:35
@adrianlizarraga
Copy link
Contributor

@fs-eire, thank you for adding these utilities. We currently have some plugin EP utilities in this directory: https://github.com/microsoft/onnxruntime/tree/main/include/onnxruntime/core/providers/utils

And there's another on the way here: #25753

Could we consolidate all of these utilities into one location?


### Usage

Make sure to include "ep/_pch.h" for all source code in the implementation. Using PCH is recommended.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using PCH is recommended.

does this mean something other than including "_pch.h"? it's not obvious to me.

perhaps I'm just not understanding the name. could you either explain it more here or give _pch.h a more descriptive name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_pch.h is the file, but we want to actually use compiler flags to specify the file as a PCH to enforce.

I probably should write Using PCH compiler flag is recommended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated comments to make it clear.


#pragma once

#include "core/framework/allocator.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naive question - is this header intended to be used by a plugin EP which doesn't have access to internal ORT code? if so, how can we use internal ORT headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The whole purpose of the folder include/onnxruntime/ep/adapter is designed for a very specific goal: to support the existing EPs (majorly WebGPU EP and CUDA EP) to be able to use the EP API, with minimal code changes.

This means most of the existing code can be kept as-is. For some components, we need a "reverse bridge" to support an implementation of OrtAllocator which wraps an ORT internal allocator, like this.

@edgchen1
Copy link
Contributor

edgchen1 commented Feb 4, 2026

@fs-eire, thank you for adding these utilities. We currently have some plugin EP utilities in this directory: https://github.com/microsoft/onnxruntime/tree/main/include/onnxruntime/core/providers/utils

And there's another on the way here: #25753

Could we consolidate all of these utilities into one location?

it may be worth considering whether the utility is generally useful to any plugin EP vs. useful to plugin EPs within the ORT repo. some of these seem like the latter.

@fs-eire
Copy link
Contributor Author

fs-eire commented Feb 4, 2026

@fs-eire, thank you for adding these utilities. We currently have some plugin EP utilities in this directory: https://github.com/microsoft/onnxruntime/tree/main/include/onnxruntime/core/providers/utils
And there's another on the way here: #25753
Could we consolidate all of these utilities into one location?

it may be worth considering whether the utility is generally useful to any plugin EP vs. useful to plugin EPs within the ORT repo. some of these seem like the latter.

I agree. we should not have separated implementation in different places of the repo.

In general, I think there are 2 different types of headers:

  • the headers that ONLY depends on ORT public headers. This can include some useful common utils/macros and helper functions.
  • the headers also depending on ORT internal headers. Specifically, for the EP adapter implementation. (currently they are in include/onnxruntime/ep/adapter)

I would recommend to put them somewhere under include/onnxruntime/, because considering the usage header-only will be much more easier for users to use. And if it's header-only, it should be put inside include/onnxruntime/.

All other details are open to discussion.

auto* allocator = static_cast<const Allocator*>(this_ptr);
return allocator->memory_info_;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to double check that AllocOnStream is not also needed for cuda or webgpu EPs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can always extend those classes to support more function that may be missing for CUDA. The classes are generally designed to be generic enough for being reused in a future migration for CUDA EP.

inline const DataTransferManager& GetDataTransferManager() const noexcept {
return data_transfer_manager_;
}
[[nodiscard]] Status GetTempSpaceCPUAllocator(AllocatorPtr* output) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetTempSpaceCPUAllocator and GetTempSpaceAllocator are internally implemented by onnxruntime::OpKernelContext. Would it be possible/appropriate to add public C APIs to OrtKernelContext for these functions instead of including them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep it in the current way.

There are 2 reasons:

  • performance consideration. Here, onnxruntime::ep::adapter::EP::GetTempSpaceCPUAllocator accepts a AllocatorPtr*, which is a pointer to an internal type. We have access to the reference to the WebGpuExecutionProvider instance any way so it's not a problem of getting the actual allocator pointer returned by it.
    If we change that to use an allocator returned by C API, then we have to use onnxruntime::ep::adapter::Allocator to wrap it (which has some overhead).

  • for usage inside include/onnxruntime/ep/adapter, there is no point to do this as long as WebGPU EP creates the allocator and use it both without the C API.

return ToOrtStatus(status);
}
*out = nullptr;
status = kernel->CreateControlFlowKernelImpl(info, out);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe add a comment explaining why we try to create a control flow kernel first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, please review the comments.

@fs-eire fs-eire force-pushed the fs-eire/ep-api-adatper-header-only branch from b8a9132 to c9e4973 Compare February 7, 2026 04:58

### Usage

Make sure to include "ep/_pch.h" for all source code in the implementation. Using PCH compiler flag is recommended.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the name _pch.h. maybe you can explain it to me. or could we name it something like ep/adapters.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are using EP_SPECIFIC_USING_DECLARATIONS to "override" some ORT classes with our adapter classes. This requires the header file to be included before any header that defines the old ORT classes. The best way to achieve this is to use the PCH (pre-compiled header), which is guaranteed by the compiler to ensure the file is included first.

we do have some 'pch.h' in winml folder, a 'test_pch.h' in cmake folder, and a PCH for CUDA EP: onnxruntime\core\providers\cuda\cuda_pch.h. Using the filename containing 'pch' may be a good way to indicating that the file is used as PCH explicitly.

The reason why the file is not inside adapter folder is because it depends on the shared plugin EP headers. Before we refactored them we probably need to keep it because otherwise #include "../common.h" seems not a very good way.

@@ -0,0 +1,7 @@
## EP adapter

This folder contains a set of C++ header files. They are used specifically for allowing ONNX Runtime internal kernel-based EPs to use the plugin-style EP API while keep minimal changes to existing code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this readme is not located in the adapters directory. maybe move it there, or explicitly name the adapters directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a section in the README to explain the folder structure. Since there is an ongoing discussion about unifying the shared headers, I expect the current folder structure to be refactored later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future, when we have a dedicated folder for the shared plugin EP headers, we can put everything in the adapter directory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants